17 research outputs found

    Three real-world datasets and neural computational models for classification tasks in patent landscaping

    Patent Landscaping, one of the central tasks of intellectual property management, includes selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is as yet little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines, proposing a novel model that takes into account textual information from the patents’ full texts as well as embeddings created based on the patents’ CPC labels. We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.

    Evaluating neural multi-field document representations for patent classification

    Patent classification constitutes a long-tailed hierarchical learning problem. Prior work has demonstrated the efficacy of neural representations based on pre-trained transformers; however, due to the limited input size of these models, it has used only the title and abstract of patents as input. Patent documents consist of several textual fields, some of which are quite long. We show that a baseline using simple tf.idf-based methods can easily leverage this additional information. We propose a new architecture combining the neural transformer-based representations of the various fields into a meta-embedding, which we demonstrate to outperform the tf.idf-based counterparts, especially on less frequent classes. Using a relatively simple architecture, we outperform the previous state of the art on CPC classification by a margin of 1.2 macro-avg. F1 and 2.6 micro-avg. F1. We identify the textual field giving a “brief-summary” of the patent as most informative with regard to CPC classification, which points to interesting future directions of research on less computation-intensive models, e.g., by summarizing long documents before neural classification.

    On partial encryption of RDF-graphs

    In this paper, a method for Partial RDF Encryption (PRE) is proposed in which sensitive data in an RDF-graph is encrypted for a set of recipients while all non-sensitive data remains publicly readable. The result is an RDF-compliant, self-describing graph containing encrypted data, encryption metadata, and plaintext data. For the representation of encrypted data and encryption metadata, the XML-Encryption and XML-Signature recommendations are used. The proposed method allows for fine-grained encryption of arbitrary subjects, predicates, objects and subgraphs of an RDF-graph. An XML vocabulary for specifying encryption policies is introduced.

    Web based visual exploration of patent information

    Patents are an invaluable source of scientific and technological information. Due to the rapidly increasing number of patent applications and the broadening of the objectives of patent analysis, there is a great demand for ubiquitous access to patent information and for flexible visualizations meeting the requirements of different groups of users. In this paper, we propose new visualization techniques for patent information and approaches for interactively exploring this information in web-based environments. We show how these visualizations can be integrated into existing web portals by using a new paradigm that we call Semantic Lens.

    Application of Semantic Technologies for Representing Patent Metadata

    Patents are among the few types of public information that have a big impact on national and international economies. In recent years, there have been great efforts to make patent data available electronically to the public via online services. But today’s services provide heterogeneous data structures, which makes automatic processing difficult. None of the services supports all user aspects, so different services have to be combined. In this paper, we present an ontology-based approach for representing patent metadata and describe a Patent Metadata Ontology (PMO) that models the major aspects of patent metadata. The advantage of our approach is that it provides a homogeneous representation of patent metadata merged from different sources. It allows for identifying context and dependency information more easily than today’s database-centric structures and interfaces.